Tool Interface Technology


Foreword 

The Technology Identification and Assessment Project combined a number of related
investigations to identify:

•  existing technology in a specific problem area to review research and development
results, and commercially available products;

•  new technologies through regular reviews of research and development results,
periodic surveys of specific areas, and identification of particularly good examples of the
application of specific technologies;

•  requirements for new technology through continuing studies of software development
needs within the DoD, and case studies of both successful and unsuccessful
projects.

Technology assessment involves understanding the software development process, determining
the potential of new technology for solving significant problems, evaluating new software tools
and methods, matching existing technologies to needs, and determining the potential payoff of
new technologies. Assessment activities of the project focused on core technology areas for
software engineering environments.

This report is one of a series of survey reports. It is not intended to provide an exhaustive
discussion of topics pertinent to the area of user interface technology. Rather, it is intended as an
informative review of the technology surveyed. These surveys were conducted in late 1985 and
early 1986.

Members of the project recognized that more general technology surveys have been conducted
by other investigators. The project did not attempt to duplicate those surveys, but focused on
points not addressed in those surveys. The goal in conducting the SEI surveys was not to
describe the technology in general, but to emphasize issues that have either a strong impact on
or are unique to software engineering environments. The objective in presenting these reports is
to provide an overview of the technologies that are core to developing software engineering
environments.

1.  Introduction 

One of the key areas in which project members were interested was tool interface technology.
This report discusses the need for tool interfaces and some of the current trade-offs in tool
interface technology, emphasizing the trade-offs between homogeneous and heterogeneous tools.
By highlighting some of the major issues, this report reflects the state of the technology today.

2.  Tool  Interface  Technology 

The fundamental goal of tool interface technology is to make it possible for many independent
hardware/software components to share information. While there are many low-level
technologies that allow the sharing of information (e.g., object file formats common among many
languages), the growing complexity of tooling and information, and the realization that coding is
but a small part of the problem, indicate that more sophisticated tools are needed.

The notion of software development environment technology implies that information is shared at
all levels (not only at the "manufacturing" level, but also at administrative and support levels)
and shared at all times during the complete product life cycle. Initial requirements specification,
problem analysis, system design, coding, testing, product delivery and distribution, maintenance,
and even obsolescence are all activities that need to share complex information in increasingly
critical ways.

Part of the technological problem is that many of the tools currently employed at these levels are
not designed to work together toward a common goal. Word processing/document processing
systems used in the requirements documents do not create structures that can be used to trace
design decisions. Project planning and management tools do not have interfaces to the actual
[...] scheduling and development of the programs by direct input of the project plan.

Implementation tools do not have provisions to feed information back into project management
tools, e.g., project tracking by direct analysis of the programming environment database.

Even [...] some project management tasks can best be handled by a spreadsheet capability, while
the output from the spreadsheet [...] might then be used to manipulate the project dependency
graph. Currently such independent programs have no connections; one must have integrated
tools designed to handle the complete task.

[...] of monolithic integrated systems [...] difficulty of incorporating new ideas into the
system. New ideas, new tools, and new needs can suddenly make the integrated system a
[...] problem rather than a solution.

An alternate approach to the highly integrated monolithic tool sets is the nearly uncontrolled
anarchy of some other environments. It is easy to create or add new tools or replace old tools,
but there is little control or standardization of the interfaces. Interfaces that are not fully
specified [...] can lead to surprising behavior when the valid (but undocumented) output of one
tool doesn't fit the specification for the input of a subsequent tool. Also, growth and extension
of such anarchic toolsets present significant managerial problems. The major thrust in tool
interfacing over the next few years should be to develop a technology that allows the following:

•  controlled but uninhibited growth,

•  interfacing  between  new  technologies  and  existing  technologies,  and 

•  interfacing of relevant but independently developed programs: within tasks, across
tasks, and at the supra-task levels of project management and administration.

It is important to remember that stronger type mechanisms in programming languages or better
data description mechanisms in conventional databases will not be adequate. Strong typing is
actually an extremely weak form of semantic consistency specification. To interchange
information among diverse applications, a stronger approach to semantic consistency is necessary.
The database approach is also syntactic, since it provides no intrinsic mechanisms that preserve
semantic consistency. Semantic consistency must be maintained by specifications and
mechanisms outside the applications programs that manipulate subsets of the information;
otherwise, the complexity is limited, and growth quickly becomes impossible because every
application program must be updated to maintain consistency with each new relation or its
equivalent. This suggests that future interface specification development should emphasize more
precise semantic specification.

Interfacing diverse tools will become a key problem in constructing sophisticated software
development environment technology. Sources of important ideas and programs or the hardware
they will use cannot be anticipated; the best or the most appropriate technology should be
integrated as it emerges.


Integration may take the form of specifying and adopting standards. Many standards are in
place, but many more need to be specified. New tooling can be developed with these interface
standards in mind. However, older tooling and tooling that needs to use information in a form
different than that for which it was developed (whether non-standardized information, information
adhering to an older standard, or information in simply a different but standardized form) must be
accommodated. This can be done by providing mapping functions that transform information on
input and/or output between the desired forms. In the presence of pervasive information, this
again demonstrates the value of handling semantic consistency with data specification in an
active database rather than with the mapping programs.
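
The mapping-function idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: form "A" is taken to be `name = value` lines, form "B" is taken to be upper-case names separated from values by a tab, and `tool_b` is a stand-in for an existing program that only understands form "B".

```python
from typing import Iterable, Iterator


def a_to_b(lines_a: Iterable[str]) -> Iterator[str]:
    """Map hypothetical form A ("name = value") to form B ("NAME<TAB>value")."""
    for line in lines_a:
        line = line.strip()
        if not line:
            continue  # blank lines carry no information in form A
        name, _, value = line.partition("=")
        yield f"{name.strip().upper()}\t{value.strip()}"


def tool_b(lines_b: Iterable[str]) -> dict:
    """Stand-in for an existing program that accepts only form B."""
    result = {}
    for line in lines_b:
        name, value = line.split("\t")
        result[name] = value
    return result


# Pipe-style composition: form A data -> mapping function -> program B.
form_a = ["module = parser", "status = tested"]
print(tool_b(a_to_b(form_a)))
```

Note that the transformer knows only about syntax. Nothing in it can tell whether the values remain meaningful to program B, which is the limitation the paragraph above points to.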

For example, a simple "Unix1 pipe" approach to interfacing data in form "A" to data in form "B"
suitable for processing via a program "B" might be:

[Figure: "A" format information -> A-to-B transformer -> input to program B -> "B" format output]

But the problem is much more serious when the scenario is:

[Figure: database -> "A" format view -> A-to-B transformer -> program B -> output fed back into the database]


The A format view represents a way to get the information from the database, but it is still in a
database-relative format (e.g., a set of relations). The A-to-B transformer must convert this
information to a suitable form for B to manipulate, and the output from B must be fed back into the
database, possibly updating the information that produced the original A format view.
Consistency and correctness of the database must be maintained, even though the A view is a
subset of the total information available and other relations might be affected by the updating
process as the output from B is fed back into the database.

The technology for dealing with these mappings is new; there is now a product available [5] that
handles some of the remapping problems, but the deeper problems remain.


3.  Issues 

To discuss the problems in tool interfacing, an overview is presented that is intended to capture
the essence of some of the problems without going into detail.

The basic goal of tool interface technology is to make it possible to interconnect the components
of a system by providing a mechanism for passing information among them. For simple tools and
information structures this can be quite straightforward, e.g., a trigonometric routine taking an
input value and producing an output value. However, as the nature and complexity of the
information changes, simple mechanisms are no longer adequate.

Simple type mechanisms were introduced to attempt to maintain consistency at the interface
level; thus, one could not pass an integer to a procedure expecting a double-precision real
number. Even this simple type mechanism is not available in many languages that support separate
compilation because the type consistency cannot be enforced across compilation boundaries.
More modern languages, compilers, and environments provide better support (e.g., Ada,
Modula-2 and C/lint), but mechanisms alone do not suffice if they are not used properly; for
example, one must actually define and create types. Passing a real number whose units are
degrees to a procedure that accepts a real number whose units are radians is often valid, but
produces surprising results.
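
The degrees/radians case can be made concrete with a small sketch. The class and function names (`Degrees`, `Radians`, `to_radians`, `sine`) are hypothetical and chosen only for illustration; the point is that the unit becomes part of the interface only if the programmer actually defines and uses such types.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Degrees:
    value: float


@dataclass(frozen=True)
class Radians:
    value: float


def to_radians(angle: Degrees) -> Radians:
    """Explicit, named conversion between the two unit types."""
    return Radians(math.radians(angle.value))


def sine(angle: Radians) -> float:
    """Accept only Radians; a bare float or a Degrees value is rejected."""
    if not isinstance(angle, Radians):
        raise TypeError("sine expects a Radians value")
    return math.sin(angle.value)


# A bare real number is accepted by math.sin even though the caller meant degrees.
print(math.sin(90.0))
# With distinct unit types the conversion has to be written out.
print(sine(to_radians(Degrees(90.0))))
```

Even here the check is only as good as the discipline of the callers: wrapping a degree value in `Radians` by mistake would still be accepted.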

Tool interface technology is well developed in the EDP/Database world. Various mechanisms
allow applications programs to run unchanged in spite of changes in the structure and
organization of the underlying database, as long as the abstractions required are preserved. Many of
these mechanisms, however, cannot be applied generally to the complex information processing
that is beginning to characterize programming and project environments.

Conventional EDP databases tend to involve a small number of scalar types, usually of fixed size,
and a small number of relations easily expressed in terms of those scalar types. Although one
can produce very complex structures in this way, the structures are usually examined along
certain restricted dimensions at any one time. A simple characterization is that there are many
instances of a few types, and any step in the processing involves a very small number of
relationships among these types. The patterns of computation are almost always predictable,
occurring at fixed, known intervals (e.g., daily, weekly, quarterly), and careful analysis of
time/space costs based on the known transaction style can allow the system architect to predict
reasonably accurately the throughput of the system.

The more complex information of lifecycle-pervasive environments (those that try to support all
aspects from the requirements assessment through post-deployment support) involves a small
number of instances, each of many types, whose fine structures are complex and not of
predetermined length; and there are many relationships among the types. Furthermore, the usage
patterns are not a priori determinable, since they depend upon particular project management
strategies, needs for information, and events that are neither regular nor frequent.

A new property that these systems introduce to software development is the presence of
persistent information, a property well-known in the EDP community. Over the lifetime of the
project, the database must not only support a heterogeneous collection of information (including
graphs, program source, documentation, test data, customer reports, etc.), but also must be
available for new tooling as it is introduced. The classical collection of text files organized by file
name or directory name is not capable of coping with this class of problems, largely because of
the unstructured nature of such information.

As more structure is imposed on the information, the needs of unanticipated new technology
must be addressed. This technology will also deal with the information and its relations in ways
far more complex than the currently available, simple relational database models can support;
and it must do so efficiently. Imposing a new structural layer onto an existing database
system has the potential of incurring unacceptably high performance costs. Nonetheless, this may
be the most effective way (given current technology) to explore the deeper issues of such a
structure.

4.  Issues in Interfacing

The following subsections represent a list of issues in interfacing. For each issue, motivations for
its choice and cost/flexibility trade-offs are given. Some promising new approaches will then be
discussed.

4.1.  Memory  Resident  Interfaces 

These interfaces are characterized by data that typically have a brief existence ranging from a
few microseconds (e.g., a stack frame) to hours (a long program run). Only a few anomalous
cases occur, e.g., operating system data structures that may persist for months if the hardware
and software are reliable. However, the interfaces usually are not transmitted external to the
program's address space. They are usually recreated when the program starts execution and do
not persist beyond the (normal or abnormal) termination of the program. Memory interfaces are
also seen as highly reliable interfaces at the bit level; there is rarely any error in the transmittal of
the physical data. The interpretation of that data, of course, is a different problem; strongly-typed
languages are an approach to syntactic correctness of the information, but not sufficiently
powerful to guarantee its semantic correctness.

These interfaces are not particularly flexible; once an instance of such an interface strategy is
determined (usually by a compiler), a strong commitment is made to its representation. It usually
cannot be changed without regenerating the system, e.g., recompiling and relinking in the
simplest case. The cost of a change can be unacceptably high when a complex set of
interactions encompassing several modules, plus acceptance testing, is involved.

4.2.  Message Passing Interfaces

These interfaces are characterized by a brief existence (the transmittal time of the message), but
are usually transmitted external to the program's address space. Messages are created,
transmitted, received, and destroyed. Significant considerations here include the fact that the
information may be transmitted in a heterogeneous environment and is frequently very simple in
structure. However, it is usually assumed that the transport mechanism is unreliable; and at
some level of abstraction, it is no longer safe to assume that a message sent is a message
received.


The need for portability often places a limitation on the complexity of the information passed
through a message. Pointers to other data structures are classically hard to encode, so what is
usually passed consists of one or more records or sequences of scalar values. However, scalars
also have their limitations (see 4.4).
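
A message made of a record of scalar values might be encoded as in the following sketch. The message layout (a sequence number, a line number, a severity code, and an elapsed time) is an assumption invented for this illustration; the sketch fixes the byte order and field sizes explicitly so that machines with different native representations read the same bytes the same way.

```python
import struct

# Hypothetical layout: sequence number (unsigned 32-bit), line number
# (unsigned 16-bit), severity (unsigned 8-bit), elapsed seconds (64-bit float).
# The "!" prefix selects network byte order with standard sizes and no padding.
MESSAGE_FORMAT = "!IHBd"


def encode(seq: int, line: int, severity: int, elapsed: float) -> bytes:
    """Pack the scalar fields into a fixed-size, machine-independent message."""
    return struct.pack(MESSAGE_FORMAT, seq, line, severity, elapsed)


def decode(payload: bytes) -> tuple:
    """Unpack a message produced by encode()."""
    return struct.unpack(MESSAGE_FORMAT, payload)


wire = encode(7, 120, 2, 0.25)
print(len(wire), decode(wire))
```

The record carries no pointers; anything that refers to another structure would first have to be replaced by a scalar identifier agreed on by sender and receiver.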

Remote Procedure Call (RPC) mechanisms are an interesting extension, and one that is
becoming more important in modern distributed computing. In RPC, the parameter passing mechanism
may have to pass complex information structures; if it passes them by reference instead of by
value, additional complications occur, and if the structures contain pointers to other structures,
even more elaborate mechanisms must be included in the RPC mechanism. RPC also has all of
the complications engendered by message loss, receiver failure, etc., with additional
complications of recovery. However, the power and flexibility of RPC are making it a potentially
important replacement for some of the more limited message passing systems in distributed
environments. A significant contribution of RPC to programming methodology is that it frees the
user from the task of determining the site of the activation. In a fully general system this may
involve scheduling resources, such as finding an idle processor, in a manner that is completely
transparent to the user.


4.3.  Persistent  Interfaces 

Persistent interfaces are those where the information being passed along the interface has an
existence quite independent of its creator. File systems and databases are classical instances.
The lifetime of such data is not only independent of the creating process, but in fact often
exceeds the useful lifetime of the code that constituted the creating process. Revised programs
must be able to access this data without requiring reorganization of the information.
Reorganization may either incur prohibitive cost or simply be impossible.2 Thus, representation
independence, data dictionaries, and similar mechanisms have arisen in the EDP community in
response to a very real set of problems. These problems have been largely ignored in the computer
science community, where persistent data may have a lifetime of only weeks or months.

In programming environments, the information has the quality of the persistent interface. In a
5-year project, it should be possible not only to access the requirements documents from which
the project was created, but also to provide annotations, communication, feedback, traces, etc.,
of the current system relative to those initial documents. Change log histories from the beginning
of the project may be needed and should be accessible. However, the environment itself may
change over time, because there are new releases of tools or completely new tools introduced in
the environment, or new hardware that requires porting of the environment. None of these events
should cause critical information to be lost.

A number of factors seem to preclude the use of conventional DBMS technology for maintaining
this information. They include: structures such as graphical data structures; program sources of
indeterminate length; annotated post-semantic syntax tree representations (such as structure
editors use); and the need to establish relations at levels finer than the gross "file" level (for
example, forming a relation between a field bug report or feature upgrade request and the line or
two of code which performs it, or the paragraph in the revised requirements document that would
reflect a change in the specification).

4.4.  Structural  Interfaces 

As information becomes more complex, it is no longer possible to encode it effectively as simple
scalars. Some mechanisms which now exist are text encodings of trees, dags, or general cyclic
graphs. While allowing a general encoding, these mechanisms can be costly. Notably, the cost
of encoding as text, writing text, reading and parsing text, and encoding text as binary data can
be quite high. When text is used as a communication mechanism between tightly coupled
components of a system, significant performance costs can be incurred. However, such mechanisms
allow communication in heterogeneous environments; for example, if the text is limited to the 95
"printable" characters of the ASCII set, plus "newline" or other equivalent punctuation, such
structures can be interchanged between various 8, 16, and 32-bit architectures.
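
A text encoding of a general cyclic graph can be sketched as follows. The line format (`node <number> <name>` and `edge <source> <target>`) is invented for this illustration and is not one of the existing mechanisms mentioned above; it simply replaces in-memory references with small integers so that the structure can be written with printable ASCII characters and newlines.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.successors = []  # direct object references ("pointers")


def write_graph(root):
    """Encode every node reachable from root as lines of ASCII text."""
    numbers, order, stack = {}, [], [root]
    while stack:
        node = stack.pop()
        if id(node) in numbers:
            continue  # already numbered; this is what makes cycles safe
        numbers[id(node)] = len(order)
        order.append(node)
        stack.extend(node.successors)
    lines = [f"node {numbers[id(n)]} {n.name}" for n in order]
    for n in order:
        for s in n.successors:
            lines.append(f"edge {numbers[id(n)]} {numbers[id(s)]}")
    return "\n".join(lines) + "\n"


def read_graph(text):
    """Rebuild the graph from its text form; node 0 is the root."""
    nodes = {}
    for line in text.splitlines():
        kind, first, second = line.split(" ", 2)
        if kind == "node":
            nodes[int(first)] = Node(second)
        else:  # an "edge" line; all node lines precede the edge lines
            nodes[int(first)].successors.append(nodes[int(second)])
    return nodes[0]


a, b = Node("a"), Node("b")
a.successors.append(b)
b.successors.append(a)  # a cycle: a -> b -> a
text = write_graph(a)
copy = read_graph(text)
print(text, end="")
```

The sketch also shows where the cost comes from: every reference is turned into text on output and looked up again on input.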

Such  changes  do  not  occur  without  management  and  development  costs.  The  readers  and 
writers  must  agree  on  the  format  of  the  information;  unless  a  fully  general  mechanism  (such  as 
extended  S-expressions)  can  be  used,  each  writer  and  reader  must  be  individually  handcrafted. 
Changes  in  the  structure  will  then  require  changes  in  all  associated  readers  and  writers.  This  can 
be  a  formidable  management  task.  Even  with  a  fully  general  mechanism,  the  form  and  content  of 
the  resulting  data  structure  must  be  agreed  upon.  Ideally,  existing  code  should  be  reasonably 
impervious  to  change  in  the  presence  of  upward-compatible  changes. 
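
One way to keep existing readers impervious to upward-compatible changes is sketched below, under the assumption of a simple `key: value` record format and two hypothetical known fields. The reader takes what it understands, supplies defaults for what is missing, and skips fields added by newer writers.

```python
# Hypothetical fields known to this (older) reader, with their defaults.
KNOWN_FIELDS = {"id": "", "status": "open"}


def read_record(text: str) -> dict:
    """Read a "key: value" record, ignoring fields this reader does not know."""
    record = dict(KNOWN_FIELDS)
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in record:
            record[key] = value.strip()
        # Unknown keys are skipped, so newer writers may add fields freely.
    return record


old_format = "id: 42\nstatus: closed\n"
new_format = "id: 42\nstatus: closed\npriority: high\n"
print(read_record(old_format) == read_record(new_format))
```

This covers only additions. A change in the meaning of an existing field would still require agreement between all readers and writers.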

Regardless of these problems, the importance of structural interfaces is increasing as more
complex information must be passed among system components. An example of a highly-structured
interface with a textual representation is the PostScript3 system [1], an interface designed for
the transmittal of complex multifont documents.

4.5.  Impact of Interface Considerations on Programming-in-the-Small

There  is,  as  usual,  a  trade-off  between  flexibility  and  other  parameters.  For  example,  a  data 
structure  access  of  the  form 

A.B.C  (Ada)

A^.B^.C  (Pascal)

A->B->C  (C)

encodes very strongly the notion that the B field is a component in the record referred to by A and
is found at a distinct offset within that record. Pascal and C are even more problematic, since the
programmer must also encode the fact that A is a pointer to a record instance, and B is a pointer
found within that record (at a specific offset) which refers to a record that contains a C field. Such
programs contain no representation independence. Mechanisms that create record definitions
from a data-dictionary-like specification and require recompilation of the programs with the new
definitions help only a small part of the problem, since there is still a commitment to a
representation at the source level.

Procedural interfaces introduce a level of abstraction; for example, the interface

C(B(A))

simply constrains the B operation to provide a piece of information when applied to the name A,
and the C operation, when applied to this value, delivers the desired result. There are those who,
with significant justification, argue that this approach provides entirely too much representation
information, and that the correct access is

C(A)

where the implementation decides (via its data dictionary) exactly how to find the C information
when given the object A.
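
The difference between the two forms can be sketched with a toy data dictionary. The names `DATA_DICTIONARY` and `get`, and the use of nested dictionaries as records, are assumptions made for this illustration only. The caller writes `C(A)`; where the C information actually lives is recorded in one place.

```python
# Hypothetical data dictionary: for each attribute, the path of keys to follow.
DATA_DICTIONARY = {"C": ("B", "C")}


def get(obj: dict, attribute: str):
    """Find an attribute of obj by following the path in the data dictionary."""
    value = obj
    for step in DATA_DICTIONARY[attribute]:
        value = value[step]
    return value


def C(obj):
    """The caller's view: C(A), with no mention of B."""
    return get(obj, "C")


A = {"B": {"C": 3}}
print(C(A))

# The representation changes: the C information now lives directly in the record.
DATA_DICTIONARY["C"] = ("C",)
A_flat = {"C": 3}
print(C(A_flat))
```

Code written as `C(A)` survives the change of representation untouched, whereas code written as `C(B(A))` would have had to be edited. The price is an extra lookup on every access.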

Procedural interfaces are extremely clumsy to write, and they only solve the problem for
right-hand-side (RHS) values; languages like Ada and Pascal do not permit procedures to return
left-hand-side (LHS) values. This means that LHS values require some other mechanism, e.g., using
'store' procedures for assignment. These do not usually work when 'var' (Pascal) or 'out' or
'in out' (Ada) parameters are required, and the result is some fairly distorted code.

Procedural interfaces are typically very expensive at runtime. They are usually implemented by
compilers as the most general procedure-call mechanism. Very few compilers allow the user to
specify that the procedure (defined in a separate module) should be compiled inline or perform
such optimizations automatically.

Many of these problems do not arise in the DBMS/EDP community because the notion of pointers
is encoded as keys in relationships. The more general mapping shown above is a common
pattern: given a tuple A and another relation R, retrieve the corresponding C data. However, as
indicated earlier, the costs associated with these are quite different, and attempting to use such a
mechanism to manipulate the abstract syntax tree in real time in a screen-based structure editor
would not give adequate performance. This is because the nature of the relations and the usage
patterns of classical DBMS systems involve coarser-grained interaction on large quantities of
structurally identical information.

A mechanism that provides data independence, works naturally within the language, and
does not incur severe cost is necessary at the programming level. Although there are some
candidates for this, they do not respond to all of the problems.


5.  Flexibility  Requirements 

There are two extreme positions of tool integration. In one model, the tool does everything; new
features are added by integrating new components into the tool. This rapidly becomes
self-limiting. By analogy, few people use a Swiss army knife, which includes knife blades and
scissors. If one adds a torque wrench, an oil filter removal tool, and a small astronomical telescope,
even fewer people would use it. Further, adding a new tool to the knife becomes increasingly
difficult.

The alternate extreme is analogous to selling an empty toolbox and providing a tool catalog.
While it allows customization, there is substantial overhead involved in identifying the right tools,
and significant problems occur if the toolbox does not have a space for them (adding a chain saw
or two-man crosscut to the average toolbox does pose certain technical difficulties). Connecting
the tools together, where that analogy applies, is substantially more complicated than buying a
3/8"-to-1/2" socket wrench adapter.

Somehow,  new  tools  that  interact  in  ways  not  yet  predicted  must  be  accommodated.  In  the  case 
of  future  computing  systems,  a  variety  of  tools  from  all  phases  of  the  project  lifecycle  must  be 
integrated  into  something  that  actually  supports  a  project  across  its  lifetime:  project  planning 
tools,  documentation  tools,  accounting  and  cost  tools,  program  construction  tools,  testing  tools, 
maintenance  support  tools,  and  many  others. 


CMU/SEI-87-TR-7 


A single vendor, tool, or machine cannot be expected to support all of these requirements or even
a subset of them effectively. As technology becomes software driven, it will be less important
which hardware is chosen since software costs are now already dominating hardware costs (e.g.,
today, it is possible to install a $40,000 CAD software package on a $12,000 computation
engine). Thus, preparations must be made to integrate programs into a computer and also to
integrate the computers that run those programs into an assemblage of other, heterogeneous,
computers that support various aspects of the project lifecycle.


6.  Potential New Technologies

A system recently coming into use within the Ada community and elsewhere is the Interface
Description Language (IDL) data structure notation, used to specify the Diana representation for
Ada compiler intermediate representation [7]. IDL provides both a language-independent
structure specification (allowing interface to multilanguage environments) and an interchange format
specification; however, for the latter case the specification does not preclude highly optimized
representations for tightly coupled systems. With certain careful engineering considerations
taken into account, an IDL support system incurs no more time or space overhead than
conventional language-specific record systems; and it can support upward-compatible changes with
automatically generated or fully generic readers and writers for a variety of interchange
representations.

More speculative, but also more promising, are systems based on the object-oriented model.
Such systems include Smalltalk [4, 8, 6, 3], Actors, and Flavors. In these systems, one does not
so much act upon data as request data to act. This shift in emphasis allows richer structures to
be built and enhancements to be made over time while maintaining a consistent interface to the
user. In addition, the notion of active databases, in which the database has responsibility for
maintaining its consistency and integrity relations rather than the (distributed) applications code,
allows much more complex structures to be built. Systems such as CAIS demonstrate that this is
a highly promising direction for future development. CAIS (the node model) currently rests
somewhere between the simpler structure representations and the fully general active database model.
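
The distinction between acting upon data and requesting data to act can be made
concrete with a small sketch of an active store. It is a toy under stated
assumptions: the class ActiveStore, the rule bodies_have_specs, and the "spec:" and
"body:" key convention are hypothetical and do not describe CAIS or any existing
database.

```python
class ActiveStore:
    """Toy active database: the store, not the caller, enforces integrity rules."""

    def __init__(self):
        self._data = {}
        self._rules = []  # list of (description, predicate over the whole store)

    def add_rule(self, description, predicate):
        self._rules.append((description, predicate))

    def put(self, key, value):
        # Build the candidate state first so a rejected update changes nothing.
        candidate = dict(self._data)
        candidate[key] = value
        for description, predicate in self._rules:
            if not predicate(candidate):
                raise ValueError(f"update rejected: {description}")
        self._data = candidate

    def get(self, key):
        return self._data[key]

    def keys(self):
        return sorted(self._data)


def bodies_have_specs(data):
    """Hypothetical integrity relation: every 'body:X' needs a 'spec:X'."""
    return all("spec:" + key[len("body:"):] in data
               for key in data if key.startswith("body:"))


store = ActiveStore()
store.add_rule("every body must have a specification", bodies_have_specs)
store.put("spec:Stack", "package Stack is ... end Stack;")
store.put("body:Stack", "package body Stack is ... end Stack;")
```

In this sketch an application simply requests an update; whether the update is
admissible is decided in one place, inside the store, rather than being repeated in
each tool that writes to it.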

For control, the notion of remote activation, of which RPC is but one instance, becomes
important. Active databases, which are themselves distributed, must be able to initiate activities on
other machines. When they are combined with object models and active databases, more
flexible and general paradigms can be developed. The concentration on costs of these
mechanisms thus becomes a structural and organizational issue (where, when, and how to act
upon information) rather than a construction issue (the cost of building these mechanisms).
Certainly the various DBMS systems have accomplished this for their problem domain; a
philosophically similar approach of developing basic mechanism packages with general applicability needs
to be followed.
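
Remote activation can be illustrated with a minimal in-process simulation. The
sketch assumes a great deal: the class Node, the operation name "recompile", and
the host names are hypothetical, and a dictionary stands in for the network. A real
mechanism would need a transport, naming, and failure handling.

```python
import json


class Node:
    """Stands in for one machine; the shared dictionary stands in for a network."""

    def __init__(self, name, network):
        self.name = name
        self.handlers = {}
        self.log = []          # names of operations activated on this node
        self._network = network
        network[name] = self

    def on(self, operation, handler):
        self.handlers[operation] = handler

    def activate(self, target, operation, **arguments):
        # Marshal the request into a flat message, as a remote call would.
        message = json.dumps({"op": operation, "args": arguments})
        return self._network[target]._receive(message)

    def _receive(self, message):
        request = json.loads(message)
        self.log.append(request["op"])
        result = self.handlers[request["op"]](**request["args"])
        # Round-trip the reply through the message form as well.
        return json.loads(json.dumps(result))


network = {}
database_host = Node("database-host", network)
build_host = Node("build-host", network)
build_host.on("recompile", lambda module: {"module": module, "status": "queued"})

# The database machine initiates an activity on another machine.
reply = database_host.activate("build-host", "recompile", module="Stack")
```

The caller names an operation and a target rather than a procedure address, which
is what lets the same request form serve RPC-like calls and database-initiated
activities alike.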

7. Rethinking the Problem

Some of the interesting implications of long-term information storage are the requirements of
[...]
move away from the database model in which records, relations, or values are "updated" (usually
changed-in-place) toward a model in which nothing is ever "discarded"; all versions of all
information are preserved. Logistically, this would entail consuming nearly infinite amounts of disk
space; pragmatically, one must regularly remove information to a long-term archive store.
However, it is currently the case that such "checkpoints" are determined by administrative action. It
must become the case that checkpoints and consistency are automatically maintained by the
system, and administrative choices for baseline points must be validated for consistency. As
systems grow in complexity, it becomes increasingly difficult for any one person or group to
maintain the consistency requirements across thousands of modules and millions of lines of code.
New approaches to the problems of information storage, independent of all other considerations,
must be considered. Thus, a multiversion, pervasive store may be an active database, an object
database, or a passive information database; it may be accessed via hand-coded interfaces,
automatically generated interfaces, message interfaces, or whatever, but it must be a new way
of thinking about the problem. Approaches such as HyperText/HyperData [2, 9] address many of
these questions, but by no means all of them. Considerable work remains.
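
A store in which nothing is changed in place, and in which baseline points are
validated before they are accepted, might be sketched as below. The class
VersionedStore, its method names, and the consistency check are hypothetical
illustrations; the sketch keeps every version in memory and ignores archiving.

```python
class VersionedStore:
    """Toy multiversion store: every put appends; no version is overwritten."""

    def __init__(self):
        self._history = {}    # key -> list of values, oldest first
        self._baselines = {}  # baseline name -> {key: version index}

    def put(self, key, value):
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions) - 1  # index of the version just stored

    def get(self, key, version=-1):
        # The default of -1 selects the most recent version.
        return self._history[key][version]

    def baseline(self, name, check):
        # Record the latest version of every key, but only if the view is consistent.
        snapshot = {key: len(vs) - 1 for key, vs in self._history.items()}
        view = {key: self._history[key][i] for key, i in snapshot.items()}
        if not check(view):
            raise ValueError(f"baseline {name!r} is inconsistent")
        self._baselines[name] = snapshot

    def at_baseline(self, name, key):
        return self._history[key][self._baselines[name][key]]


def body_matches_spec(view):
    """Hypothetical consistency rule: the body must name the current spec version."""
    return view["spec"] in view["body"]


store = VersionedStore()
store.put("spec", "v1")
store.put("body", "uses spec v1")
store.baseline("release-1", body_matches_spec)   # accepted
store.put("spec", "v2")                          # old version remains retrievable
```

After the last line, an attempt to record a second baseline with the same check
would be refused in this sketch, because the body still refers to the earlier
specification; the choice of baseline is validated rather than taken on trust.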


8. A Taxonomy of Issues

The previous sections have discussed some of the topics in tool interface technology that are
presently important to software development environment technology. In summary, these topics
involve the following issues:

• transient vs. persistent data,

• data vs. control issues (local procedural communication vs. remote activation),

• strong typing (syntactic) vs. interpretation (semantic), and

• information structures passed (single word, fixed length text, variable length text,
structures, pointers, objects).


Each of these represents a different kind of interface problem. Many of the problems have
characteristics that make them appear to be information management problems rather than
simple information communication.

9.  Conclusions 

Tool interfacing is one of the core technologies that must be understood and treated properly for
software development environment technology to continue to evolve. Unfortunately, tool
interfacing is not well understood, and the trade-offs between alternative interfacing methods are not
easily evaluated. A homogeneous system is attractive for obvious reasons, but system
homogeneity inhibits the ability of software development environment technology to evolve